Thompson Sampling for Linear-Quadratic Control Problems
Authors
Abstract
We consider the exploration-exploitation tradeoff in linear-quadratic (LQ) control problems, where the state dynamics are linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the empirical and theoretical success of TS in a wide range of problems, from multi-armed bandits to linear bandits, we show that when studying the frequentist regret of TS in control problems, we need to trade off the frequency of sampling optimistic parameters against the frequency of switches in the control policy. This results in an overall regret of $O(T^{2/3})$, which is significantly worse than the $O(\sqrt{T})$ regret achieved by the optimism-in-the-face-of-uncertainty algorithm in LQ control problems.
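To make the sampling-versus-switching trade-off concrete, here is a minimal Python sketch of Thompson sampling with lazy policy updates on a scalar LQ system $x_{t+1} = a x_t + b u_t + w_t$ with per-step cost $q x_t^2 + r u_t^2$. It is not the algorithm analyzed in the paper: it has no optimism check, and the Gaussian prior, the known noise level `sigma`, the update period `tau`, and the box used to clip sampled parameters (a stand-in for a bounded admissible parameter set) are assumptions chosen for illustration; all function names are hypothetical. Every `tau` steps the sketch draws $(\tilde a, \tilde b)$ from the regularized least-squares posterior, solves the scalar Riccati equation for that sample, and holds the resulting gain fixed until the next draw, so `tau` directly controls how often the policy may switch.

```python
import math
import random


def scalar_riccati(a, b, q, r, iters=1000, tol=1e-10):
    """Value iteration for the scalar discrete algebraic Riccati equation
    P = q + a^2 * P * r / (r + b^2 * P). Returns None if it does not settle."""
    p = q
    for _ in range(iters):
        p_next = q + a * a * p * r / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        if p_next > 1e12:  # diverging: the sample is not usable for control
            return None
        p = p_next
    return None


def lqr_gain(a, b, q, r):
    """Optimal gain K (control u = -K x) for the model (a, b), or None."""
    p = scalar_riccati(a, b, q, r)
    if p is None:
        return None
    return a * b * p / (r + b * b * p)


def sample_posterior(v, s, sigma, rng):
    """Draw (a, b) ~ N(V^{-1} s, sigma^2 V^{-1}), with v = (V00, V01, V11)."""
    v00, v01, v11 = v
    det = v00 * v11 - v01 * v01
    i00, i01, i11 = v11 / det, -v01 / det, v00 / det  # entries of V^{-1}
    mean_a = i00 * s[0] + i01 * s[1]
    mean_b = i01 * s[0] + i11 * s[1]
    l00 = math.sqrt(i00)  # 2x2 Cholesky factor of V^{-1}
    l10 = i01 / l00
    l11 = math.sqrt(max(i11 - l10 * l10, 0.0))
    e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean_a + sigma * l00 * e0, mean_b + sigma * (l10 * e0 + l11 * e1)


def thompson_lq(a_true, b_true, q=1.0, r=1.0, horizon=2000, tau=50, sigma=0.1,
                lam=1.0, a_box=(0.0, 1.2), b_box=(0.3, 1.0), seed=0):
    """Lazy TS for x_{t+1} = a x_t + b u_t + w_t; the gain changes at most
    once every tau steps. Returns (average cost, number of policy updates)."""
    rng = random.Random(seed)
    v = [lam, 0.0, lam]  # regularized Gram matrix of z_t = (x_t, u_t)
    s = [0.0, 0.0]  # running sum of z_t * x_{t+1}
    x, k, total_cost, updates = 0.0, 0.0, 0.0, 0
    for t in range(horizon):
        if t % tau == 0:
            a_s, b_s = sample_posterior(v, s, sigma, rng)
            # Hypothetical admissible box (assumed known): clip the sample into it.
            a_s = min(max(a_s, a_box[0]), a_box[1])
            b_s = min(max(b_s, b_box[0]), b_box[1])
            k_new = lqr_gain(a_s, b_s, q, r)
            if k_new is not None:
                k, updates = k_new, updates + 1
        u = -k * x
        total_cost += q * x * x + r * u * u
        x_next = a_true * x + b_true * u + rng.gauss(0.0, sigma)
        v[0] += x * x  # rank-one update of the Gram matrix
        v[1] += x * u
        v[2] += u * u
        s[0] += x * x_next
        s[1] += u * x_next
        x = x_next
    return total_cost / horizon, updates
```

For example, `thompson_lq(0.95, 0.6)` returns the empirical average cost and the number of policy updates. Lowering `tau` lets the controller react to a better posterior sooner, at the price of more frequent policy switches, which is the tension the abstract describes.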
Similar resources
Haar Matrix Equations for Solving Time-Variant Linear-Quadratic Optimal Control Problems
In this paper, Haar wavelets are applied to solve continuous time-variant linear-quadratic optimal control problems. First, using necessary conditions for optimality, the problem is converted into a two-point boundary value problem (TBVP). Next, Haar wavelets are used to convert the TBVP, a system of differential equations, into a system of matrix algebraic equations...
Linear-quadratic optimal sampled-data control problems: Convergence result and Riccati theory
We consider a general linear control system and a general quadratic cost, where the state evolves continuously in time and the control is sampled, i.e., is piecewise constant over a subdivision of the time interval. This is the framework of a linear-quadratic optimal sampled-data control problem. As a first result, we prove that, as the sampling periods tend to zero, the optimal sampled-data con...
An Optimal Fuzzy Sliding Mode Controller Design Based on Particle Swarm Optimization and Using Scalar Sign Function
This paper addresses the problems caused by an inappropriate selection of sliding surface parameters in fuzzy sliding mode controllers via an optimization approach. In particular, the proposed method employs the parallel distributed compensator scheme to design the state-feedback control law. The controller gains are determined offline via a linear quadratic regulator. The particle ...
A New Approach for Solving Fully Fuzzy Quadratic Programming Problems
Quadratic programming (QP) is an optimization problem wherein one minimizes (or maximizes) a quadratic function of a finite number of decision variables subject to a finite number of linear inequality and/or equality constraints. In this paper, a fully fuzzy quadratic programming (FFQP) problem is considered in which all cost coefficients, constraint coefficients, and right-hand sides are characterized by ...
Linear Thompson Sampling Revisited
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\tilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and h...
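For comparison with the LQ setting, the basic linear TS loop for a stochastic linear bandit can be sketched in pure Python. This is an illustrative sketch, not the exact algorithm from that paper: the finite arm set, Gaussian reward noise, ridge parameter `lam`, and the constant perturbation factor `scale` are assumptions (frequentist analyses typically inflate the perturbation by a slowly growing factor), and the function names are hypothetical. Each round it samples $\tilde\theta = \hat\theta + \text{scale}\cdot L^{-\top}\eta$ with $V = LL^\top$ and $\eta \sim \mathcal{N}(0, I)$, so that $\tilde\theta \sim \mathcal{N}(\hat\theta, \text{scale}^2 V^{-1})$, then plays the arm maximizing $x^\top\tilde\theta$.

```python
import math
import random


def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))


def cholesky(m):
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - acc)
            else:
                low[i][j] = (m[i][j] - acc) / low[j][j]
    return low


def solve_lower(low, y):
    """Solve L z = y by forward substitution."""
    z = [0.0] * len(y)
    for i in range(len(y)):
        z[i] = (y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    return z


def solve_upper_t(low, y):
    """Solve L^T z = y by back substitution (L^T is upper triangular)."""
    n = len(y)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(low[k][i] * z[k] for k in range(i + 1, n))) / low[i][i]
    return z


def linear_ts(arms, theta_star, horizon=1000, noise=0.1, lam=1.0, scale=0.1, seed=0):
    """Linear TS on a finite arm set; returns the cumulative pseudo-regret."""
    rng = random.Random(seed)
    d = len(theta_star)
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]  # V
    b = [0.0] * d  # running sum of x_t * reward_t
    best = max(dot(x, theta_star) for x in arms)
    regret = 0.0
    for _ in range(horizon):
        low = cholesky(gram)
        mean = solve_upper_t(low, solve_lower(low, b))  # ridge estimate V^{-1} b
        eta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        pert = solve_upper_t(low, eta)  # L^{-T} eta ~ N(0, V^{-1})
        theta = [m + scale * p for m, p in zip(mean, pert)]
        x = max(arms, key=lambda arm: dot(arm, theta))
        reward = dot(x, theta_star) + rng.gauss(0.0, noise)
        regret += best - dot(x, theta_star)
        for i in range(d):
            b[i] += x[i] * reward
            for j in range(d):
                gram[i][j] += x[i] * x[j]
    return regret
```

Solving with the Cholesky factor instead of forming $V^{-1}$ explicitly keeps the sketch short and numerically well behaved for small `d`.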